A novel CNN architecture for accurate early detection and classification of Alzheimer’s disease using MRI data

Alzheimer’s disease (AD) is a debilitating neurodegenerative disorder that requires accurate diagnosis for effective management and treatment. In this article, we propose a convolutional neural network (CNN) architecture that uses magnetic resonance imaging (MRI) data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset to categorize AD. The network employs two separate CNN models, each with distinct filter sizes and pooling layers, whose outputs are concatenated in a classification layer. The multi-class problem is addressed across three, four, and five categories, for which the proposed CNN architecture achieves accuracies of 99.43%, 99.57%, and 99.13%, respectively. These high accuracies demonstrate the efficacy of the network in capturing and discerning relevant features from MRI images, enabling precise classification of AD subtypes and stages. The network architecture leverages the hierarchical nature of convolutional layers, pooling layers, and fully connected layers to extract both local and global patterns from the data, facilitating accurate discrimination between different AD categories. Accurate classification of AD carries significant clinical implications, including early detection, personalized treatment planning, disease monitoring, and prognostic assessment. The reported accuracy underscores the potential of the proposed CNN architecture to assist medical professionals and researchers in making precise and informed judgments regarding AD patients.


Related work
In recent years, there has been a surge in the application of deep learning techniques to categorize Alzheimer's disease (AD) using data from multimodal brain imaging. Leveraging the rich data provided by numerous imaging modalities, several research studies have proposed enhanced deep convolutional neural networks (CNNs) for AD categorization.
For predicting mild cognitive impairment (MCI) conversion, the authors of 23 developed a domain transfer learning-based model. They utilized various modalities, employing target and auxiliary domain data samples, and achieved a prediction accuracy of 79.40%. Reference 24 introduced a robust deep-learning methodology using MRI and PET modalities. They incorporated a dropout strategy to enhance categorization performance. Additionally, they applied the deep learning framework's multi-task learning method, assessing variations with and without dropout. The dropout technique yielded a 5.9% improvement in the experiments. In 25, the authors presented two CNN-based models, evaluating volumetric and multi-view CNNs in classification tests and integrating multi-resolution filtering, which directly influenced classification outcomes.
The authors of 26 proposed a 2D CNN method based on ResNet50, incorporating multiple batch normalization and activation algorithms to classify brain slices into three classes: NC, MCI, and AD. The proposed model achieved an accuracy rate of 99.82%. To identify specific local brain morphological traits essential for AD diagnosis, another study 27 developed a SegNet-based deep learning approach, finding that employing a deep learning technique and a pre-trained model significantly enhanced classifier performance. In 28, a 3D CNN was designed to distinguish between AD and CN using resting-state fMRI images. Meanwhile, Çelebi et al. 29 utilized morphometric images from Tensor-Based Morphometry (TBM) preprocessing of MRI data. Their study employed a deep learning method based on the dense-block Xception architecture, achieving high accuracy in early-stage Alzheimer's disease diagnosis. However, this study did not address issues such as dataset variability, overfitting, and challenges with TBM image feature extraction.
To diagnose Alzheimer's disease, Baglat et al. 30 proposed hybrid machine learning-based models using SVM, Random Forest, and logistic regression. Their models utilized MRI patient scans from the OASIS dataset. The analysis of Salehi et al. 31 emphasized that employing a deep learning approach would enhance early-stage Alzheimer's disease forecasting. They utilized the OASIS and ADNI datasets. Fu'adah et al. 20 introduced an AlexNet-based CNN classification model, achieving 95% accuracy using a collection of MRI images related to Alzheimer's.
www.nature.com/scientificreports/
Murugan et al. 32 presented a CNN model for Alzheimer's disease recognition. Their proposed model consisted of two convolutional layers, one max-pooling layer, and four dementia network blocks, achieving an accuracy of 95.23% using the ADNI MRI image dataset. Salehi et al., in another study, employed MRI scans to diagnose Alzheimer's disease using a CNN, achieving an average accuracy of 84.83%. Concurrently, Noh et al. 33 proposed a 3D-CNN-LSTM model, utilizing extractors for spatial and temporal features and achieving high accuracy results of 96.43%, 95.71%, and 91.43%.
Rallabandi et al. 34 presented a system for early diagnosis and categorization of AD and MCI in older cognitively normal individuals, employing the ADNI database. Their model achieved a 75% accuracy across various machine learning techniques. Furthermore, Odusami et al. 21 introduced a pre-trained CNN hybrid model, employing deep feature concatenation, weight randomization, and gradient-weighted class activation mapping to enhance Alzheimer's disease identification. Bamber et al. 35 developed a CNN using a shallow convolution layer for Alzheimer's disease classification in medical image patches, achieving an accuracy of 98%. Additionally, AlzheimerNet by Akter et al., a modified InceptionV3 model 36, demonstrated outstanding accuracy in Alzheimer's disease stage classification from brain MRIs, surpassing traditional methods with a test accuracy of 98.67%.

Materials
This section describes the data source used to train a CNN model to recognize AD phases and the image preprocessing methods applied to the dataset.

Description of the AD dataset
Numerous datasets that can be used to classify AD are available online. However, some of the CSV-formatted AD datasets are unsuitable for this study. Access to datasets from dedicated organizations such as Kaggle, ADNI 37, and OASIS 38 is available for research and educational purposes. The MRI scans utilized in this study come from the MRI ADNI dataset. The Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset includes patients with Alzheimer's disease, patients with mild cognitive impairment (MCI), and healthy controls. It encompasses genetic information, cognitive tests, blood and CSF biomarkers, MRI and PET images, as well as clinical information. Table 1 presents statistical information regarding the MRI ADNI dataset.
The data consist of 1296 T1-weighted MRI scans. Each scan produces a 3D image of the brain with 1.5 mm isotropic voxels. As seen in Fig. 1, each scan belongs to one of five classes: cognitively normal (CN), early MCI (EMCI), late MCI (LMCI), AD, and MCI.

Data preprocessing
The ADNI dataset was chosen for this study based on its suitability for our research objectives. The dataset, contributed by the Alzheimer's Disease Neuroimaging Initiative (ADNI), represents a globally collaborative research effort aimed at developing and validating neuroimaging tools to track the progression of Alzheimer's disease (AD). It comprises data collected from ADNI Imaging Centers, located in clinics and medical institutions across the United States and other parts of the world. Prior to its public release, the data underwent processing and preparation by ADNI-funded MRI Analysis Laboratories. To optimize the quality and consistency of the images for analysis, the dataset's images underwent essential pre-processing steps. As illustrated in Fig. 2, these steps included:
• Scaling: Uniformly resizing all images to 224 pixels in both width and height.
• Augmentation: Enhancing the dataset's diversity and mitigating overfitting by employing data augmentation techniques, as referenced in 39,40.
To address the issue of imbalanced classes within the dataset, as visualized in Fig. 1, we employed the ADASYN technique to generate synthetic data for underrepresented classes.

Data augmentation
To minimize overfitting during neural network training, data augmentation is employed. This technique applies class-preserving transformations to individual samples, artificially expanding the dataset 41. Using reproducible transformations allows new samples to be generated without altering the image's semantic meaning. Because newly labeled images are difficult to obtain in the medical field and expert knowledge is in limited supply, data augmentation is a reliable way to expand the dataset.
For our work, we devised an image augmentation method that incorporates cropping, scaling, flipping, and adjusting the brightness and contrast of the images.
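The operations named above can be sketched on a small grayscale image stored as a list of rows with pixel values in [0, 1]. This is only an illustration: the crop fraction, the flip probability, and the brightness and contrast ranges are assumptions, since the text does not state them, and the function names are hypothetical.

```python
import random

def flip_horizontal(img):
    """Mirror every row of a 2D image (list of rows)."""
    return [row[::-1] for row in img]

def random_crop(img, out_h, out_w, rng):
    """Cut a random out_h x out_w window out of the image."""
    top = rng.randint(0, len(img) - out_h)
    left = rng.randint(0, len(img[0]) - out_w)
    return [row[left:left + out_w] for row in img[top:top + out_h]]

def resize_nearest(img, out_h, out_w):
    """Rescale to out_h x out_w with nearest-neighbour sampling."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

def adjust_brightness_contrast(img, brightness=0.0, contrast=1.0):
    """Scale pixels around mid-gray, shift them, and clip to [0, 1]."""
    return [[min(1.0, max(0.0, contrast * (p - 0.5) + 0.5 + brightness))
             for p in row] for row in img]

def augment(img, rng):
    """Return one random variant with the same size as the input."""
    h, w = len(img), len(img[0])
    # Assumed settings: 90% crop, 50% flip chance, small intensity jitter.
    out = random_crop(img, max(1, int(0.9 * h)), max(1, int(0.9 * w)), rng)
    out = resize_nearest(out, h, w)
    if rng.random() < 0.5:
        out = flip_horizontal(out)
    return adjust_brightness_contrast(out, rng.uniform(-0.1, 0.1),
                                      rng.uniform(0.9, 1.1))
```

Passing a seeded `random.Random` instance as `rng` keeps the generated variants reproducible between runs.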

ADASYN technique for balancing the AD dataset
There are two standard resampling methods: oversampling and undersampling. Oversampling creates samples for the minority class, while undersampling removes samples from the majority class. In the proposed strategy, we employ an oversampling technique called ADASYN 42. ADASYN stands for Adaptive Synthetic Sampling Approach, a technique in machine learning designed to address class imbalance in datasets. Like SMOTE (Synthetic Minority Oversampling Technique), ADASYN aims to enhance the performance of classification models by artificially increasing the number of data points in the minority class. However, ADASYN employs a more sophisticated approach than SMOTE.
The core concept of ADASYN is to use a weighted distribution over the minority-class examples according to how difficult they are for the learner. More synthetic data are therefore generated for the minority-class examples that are harder to learn than for those that are easier to learn. Thus, the ADASYN approach improves learning with respect to the data distribution in two ways: it mitigates bias stemming from class imbalance, and it adaptively shifts the classification decision boundary toward the difficult samples. As depicted in Fig. 3, to better represent the minority classes, ADASYN introduces additional synthetic examples using nearest-neighbor methods. SMOTE also interpolates between neighboring minority-class points, but it generates the same number of synthetic samples for every minority point regardless of how difficult that point is to learn. ADASYN instead generates new data points in the areas where they are most needed, which can yield improved performance on complex data.
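The weighting idea can be illustrated with a minimal two-class sketch in plain Python. It is not the implementation used in this work: the neighbourhood size `k`, the balance level `beta`, and all function names are assumptions, and the sketch expects at least two minority points.

```python
import math
import random

def nearest(point, candidates, k):
    """Indices of the k candidates closest to point (Euclidean distance)."""
    order = sorted(range(len(candidates)),
                   key=lambda j: math.dist(point, candidates[j]))
    return order[:k]

def adasyn(minority, majority, k=3, beta=1.0, seed=0):
    """Generate synthetic minority points, more of them near hard examples."""
    rng = random.Random(seed)
    # Number of synthetic points needed to reach the requested balance.
    total = round((len(majority) - len(minority)) * beta)
    points = list(minority) + list(majority)
    is_majority = [0] * len(minority) + [1] * len(majority)

    # Difficulty of each minority point: share of majority points among
    # its k nearest neighbours in the whole dataset.
    ratios = []
    for i, x in enumerate(minority):
        neigh = [j for j in nearest(x, points, k + 1) if j != i][:k]
        ratios.append(sum(is_majority[j] for j in neigh) / k)

    norm = sum(ratios)
    if norm == 0:  # no minority point is near the majority: spread evenly
        weights = [1 / len(minority)] * len(minority)
    else:
        weights = [r / norm for r in ratios]

    synthetic = []
    for i, x in enumerate(minority):
        others = [m for j, m in enumerate(minority) if j != i]
        neigh = nearest(x, others, min(k, len(others)))
        # Rounding means the final count can differ slightly from `total`.
        for _ in range(round(weights[i] * total)):
            z = others[rng.choice(neigh)]
            lam = rng.random()
            # Interpolate between x and a neighbouring minority point.
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, z)))
    return synthetic
```

A multiclass dataset such as the five-class one used here would need this procedure applied to each underrepresented class in turn, treating the remaining classes as the majority.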

Data splitting
In this approach, the dataset was divided into three subsets. The training and validation sets are used to fit the model and to evaluate its performance during training, while the test subset is held out for model prediction. As depicted in Fig. 4, the data was randomly allocated, with 90% for training and 10% for testing. Subsequently, cross-validation was applied solely to the training data. This process involves dividing the data into multiple subsets, using each subset in turn as a validation set, and then averaging the outcomes. Such an approach helps alleviate potential dataset bias. The validation dataset assists in tuning hyperparameters, such as regularization
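A minimal sketch of this splitting scheme follows: a random 90/10 train/test split, then k-fold cross-validation over the training indices only. The number of folds is not stated in the text, so `k=5` is an assumption, as are the function names.

```python
import random

def train_test_split(n_samples, test_fraction=0.10, seed=0):
    """Shuffle sample indices and hold out a fraction for testing."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = round(n_samples * test_fraction)
    return idx[n_test:], idx[:n_test]

def k_fold(train_idx, k=5):
    """Yield (train, validation) index lists; each sample validates once."""
    folds = [train_idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val

train_idx, test_idx = train_test_split(3000)
for fold_train, fold_val in k_fold(train_idx, k=5):
    # Fit on fold_train, score on fold_val, then average the fold scores.
    pass
```

The test indices are never passed to `k_fold`, so hyperparameter choices made on the validation folds cannot leak information from the test set.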

The proposed CNN model description
To process diverse patient data, we are constructing a network comprising two separate CNN models concatenated in a classification layer, as illustrated in Fig. 5. A 224 × 224 × 3 tensor, representing the temporal dimension and the axes (x, y, and z), serves as the input for the network. The first CNN model is initiated with two convolutional layers, each housing 16 filters of size 3 × 3.
These filters extract local features from the input images. Subsequently, 2 × 2 max-pooling layers with a stride of 2 are applied to downsample the feature maps and capture pivotal information. The subsequent two convolutional layers each incorporate 64 filters, enhancing the representation of higher-level features. Another round of max-pooling is executed to reduce spatial dimensions. Following this, a single convolutional layer with 256 filters of size 3 × 3 is introduced to capture intricate patterns. To combat overfitting, a dropout layer with a 20% rate is incorporated, and batch normalization is employed to normalize activations, ensuring improved training stability. Finally, a fully connected layer with 128 neurons is appended to glean global insights from the flattened feature maps.
The second CNN model follows a comparable structure but with distinct filter sizes. It commences with two convolutional layers, each comprising 32 filters of size 5 × 5. Subsequently, 2 × 2 max-pooling layers are applied with a stride of 2. The ensuing two convolutional layers each contain 128 filters of size 5 × 5. A subsequent round of max-pooling is executed for spatial dimension reduction. This is succeeded by a convolutional layer encompassing 512 filters of size 5 × 5. Similarly, a 20% dropout layer is employed to prevent overfitting, and batch normalization is integrated for enhanced training stability. Ultimately, a fully connected layer with 128 neurons is appended to extract global insights from the feature maps.
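As a rough check on the two branches described above, the following sketch traces feature-map shapes and counts convolutional parameters. It assumes "same" padding, stride 1, one bias per filter, and 3 × 3 kernels for every convolution of the first branch (the text gives the kernel size only for some of its layers). Dropout, batch normalization, and the dense layers are left out, so the totals need not match Table 2. All helper names are illustrative.

```python
def conv(shape, filters, kernel):
    """Shape and parameter count of a stride-1, same-padded convolution."""
    h, w, c = shape
    params = (kernel * kernel * c + 1) * filters  # weights + one bias per filter
    return (h, w, filters), params

def pool(shape, size=2, stride=2):
    """Shape after max-pooling; pooling has no trainable parameters."""
    h, w, c = shape
    return ((h - size) // stride + 1, (w - size) // stride + 1, c), 0

def trace_branch(input_shape, kernel, widths):
    """widths = filter counts of (first conv pair, second conv pair, last conv)."""
    a, b, c = widths
    layers = [("conv", a), ("conv", a), ("pool", None),
              ("conv", b), ("conv", b), ("pool", None),
              ("conv", c)]
    shape, total = input_shape, 0
    for kind, filters in layers:
        if kind == "conv":
            shape, params = conv(shape, filters, kernel)
        else:
            shape, params = pool(shape)
        total += params
    return shape, total

# Branch 1: 3 x 3 kernels with 16, 64, and 256 filters.
shape1, params1 = trace_branch((224, 224, 3), kernel=3, widths=(16, 64, 256))
# Branch 2: 5 x 5 kernels with 32, 128, and 512 filters.
shape2, params2 = trace_branch((224, 224, 3), kernel=5, widths=(32, 128, 512))
# Each branch ends in a 128-neuron dense layer; concatenating the two
# outputs gives the feature vector passed to the classification layer.
fused_features = 128 + 128
```

Under these assumptions, the second branch carries roughly an order of magnitude more convolutional parameters than the first, because both its kernels and its filter counts are larger.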
The prediction, which gives the probability that the input belongs to each of the five classes, is generated by concatenating the features extracted by each CNN and passing them through a fully connected network. The predicted class is the one with the highest probability. Table 2 furnishes a comprehensive description of the network architecture, detailing each layer's operation, size, filter count, and output, along with the number of parameters for each layer. Every parameter is trainable and is updated during backpropagation. Table 3 enumerates the hyperparameters of the CNN model.
1. Accuracy: Accuracy represents the percentage of all predictions that were correct. Generally, values above 80% are considered good, while values exceeding 90% are deemed excellent. This metric is determined by the following expression 43.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP, TN, FN, and FP are the true positive, true negative, false negative, and false positive counts, respectively.

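2. Precision: Precision is defined as the ratio of correct positive predictions to all positive predictions 46. In general, precision values over 80% are regarded as satisfactory. It is computed as follows:
$$\text{Precision} = \frac{TP}{TP + FP}$$
3. Recall: Recall is the ratio of correct positive predictions to all actual positive cases 43. Acceptable recall values typically range from 70 to 90%. The following equation is used to compute the recall:
$$\text{Recall} = \frac{TP}{TP + FN}$$
4. F1-score: The F1-score combines precision and recall into a single value and is reported separately for each class label 43. Use the following calculation to determine the F1-score:
$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$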

5. Balanced accuracy: It is calculated by averaging the true positive rate (TPR) and the true negative rate (TNR):
$$\text{Balanced accuracy} = \frac{TPR + TNR}{2}$$
The TPR is the proportion of actual positive cases that are correctly identified, while the TNR is the proportion of actual negative cases that are correctly identified 44.

6. Matthews Correlation Coefficient (MCC): The MCC is a more complex metric that accounts for the imbalance between positive and negative examples in a dataset. If one class significantly outweighs the other in occurrences, simpler metrics can become skewed 45. The MCC is calculated as follows:
$$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
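The six metrics defined in this section can all be computed from the four confusion-matrix counts of a single class. The sketch below uses the standard definitions; the function name is illustrative, and it assumes every denominator is non-zero.

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Standard metrics from true/false positive and negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # true positive rate (TPR)
    specificity = tn / (tn + fp)     # true negative rate (TNR)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "balanced_accuracy": (recall + specificity) / 2,
        "mcc": (tp * tn - fp * fn) / mcc_den,
    }

# Hypothetical counts, chosen only to show the call.
metrics = binary_metrics(tp=40, tn=50, fp=5, fn=5)
```

For a multiclass problem, the counts are obtained per class by treating that class as positive and all other classes as negative.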

Model development and training
In our work, we trained and validated the classifier using open-source software: Python 3.0 and the Google Colaboratory Pro platform 46, equipped with a GPU: 1xTesla

Experiments and results
In the following section, we describe the steps of the experiment, present the results, and compare them with previous findings. As depicted in Fig. 2, after loading the ADNI MRI data, we augmented the images and utilized the ADASYN approach to address data imbalance. The dataset size expanded to 3,000 images after ADASYN was applied. Subsequently, we divided the data into three sets based on the proportions illustrated in Fig. 3: training, validation, and test sets. Ultimately, we used the training data to train the proposed model.
The proposed model comprises two distinct CNNs merged at the classification stage. We applied the 5-way multiclass MRI dataset to each network individually. Performance evaluation employed metrics such as accuracy, recall, precision, balanced accuracy, Matthews correlation coefficient, and loss. These individual network performances were then compared with the combined CNN performance, as outlined in Table 4.
Tables 5, 6, and 7 present the classification performance results of these CNN networks, focusing on metrics like recall, precision, F1-score, and support, where 'support' denotes the number of samples.
As the results show, reducing the size of a filter can lead to improved classification results. Specifically, CNN2, which employs a 5 × 5 filter size, needs twice the number of filters present in CNN1 (which uses a 3 × 3 filter size) to achieve an accuracy comparable to that of CNN1. Furthermore, when the two networks are combined, the resultant network exhibits higher accuracy than either of the individual networks. This improvement arises because the two networks complement one another, offering different perspectives on the data.
To evaluate the effectiveness of this approach across various classification tasks, we applied the combined network to datasets, providing experimental results for a benchmark five-way multiclass classification problem 16 , a benchmark four-way multiclass classification problem 28 , and a benchmark three-way classification problem 47 .
Figure 6 first shows graphs contrasting the proposed model's training accuracy against validation accuracy, as well as training loss versus validation loss, for the three-way, four-way, and five-way multiclass problems. Table 8 compares the performance of the proposed model across the aforementioned multiclass problems.

Confusion matrix
A confusion matrix is employed to evaluate and compute various classification model metrics. It gives the numerical breakdown of a model's predictions during the testing phase 43. A confusion matrix for the proposed model was developed, as seen in Figs. 7 and 8, to evaluate how well the suggested network performed on each class in the test data. Additionally, Tables 7, 9, and 10 provide the per-class classification report of the proposed model based on precision, recall, and F1-score.
Figure 7c shows that, in the five-way classification, one CN subject was misclassified as EMCI and another as MCI. Errors of this kind are the more acceptable ones in medical diagnosis, because flagging a healthy person as diseased is preferable to missing a diseased person through a false negative. As depicted in Fig. 8, one EMCI subject was misclassified as AD in the four-way classification, and one EMCI subject was misclassified as AD in the three-way classification.
For the three-way, four-way, and five-way multiclass classifications, the suggested model yielded average accuracy values of 99.43%, 99.57%, and 99.13%, respectively. Additionally, as depicted in Fig. 9, the suggested model was examined to determine whether the predicted label matched the actual label.
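The way a multiclass confusion matrix leads to the per-class precision, recall, F1-score, and support values can be sketched as follows. The labels are made-up toy data, not results from this study, and the function names are illustrative.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the actual class, columns the predicted class."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def class_report(cm):
    """Per-class precision, recall, F1-score, and support."""
    report = []
    for c in range(len(cm)):
        tp = cm[c][c]
        support = sum(cm[c])                    # samples that truly are class c
        predicted = sum(row[c] for row in cm)   # samples predicted as class c
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report.append({"precision": precision, "recall": recall,
                       "f1": f1, "support": support})
    return report

# Toy labels for a three-class problem (hypothetical values).
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
report = class_report(cm)
```

Off-diagonal entries of `cm` correspond to the misclassified subjects discussed above: an entry in row r and column c counts samples of class r predicted as class c.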

GRAD-CAM analysis
A crucial challenge in deep learning lies in making complex neural networks more interpretable. This is especially critical in applications like medical imaging, where trust and understanding are paramount. Gradient-weighted Class Activation Mapping (Grad-CAM), developed by Selvaraju et al. 48, addresses this challenge. The technique acts as a magnifying glass for deep neural networks, providing a visual representation of what the network focuses on when it analyzes data. The MRI scan serves as the input for the suggested model, which is used as a detection technique. Grad-CAM is applied to the last convolutional layer of each of the two proposed CNN models, before their features are concatenated to obtain the predicted label. The feature map for the suggested network is extracted in this case using the Grad-CAM technique. The resulting heat map highlights the image regions that are essential for determining the target class. Furthermore, the significance of each CNN model in decision-making, as well as the impact of varying the size and number of filters in each model, can be determined with this method. The heatmaps and visualizations created by applying the Grad-CAM algorithm to MRI scan images of AD, CN, and MCI subjects are shown in Fig. 10. This visual evidence not only enhances our understanding of the model's predictions but also paves the way for validating Alzheimer's diagnoses with greater confidence.
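For reference, the Grad-CAM heat map of Selvaraju et al. can be written as follows, where $A^k$ is the $k$-th feature map of the chosen convolutional layer, $y^c$ is the network score for class $c$ before the softmax, and $Z$ is the number of spatial positions in a feature map. The symbols follow the original Grad-CAM formulation, not notation from this article.

```latex
% Importance weight of feature map k for class c:
% the class-score gradient averaged over all spatial positions (i, j).
\alpha_k^c = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^c}{\partial A^k_{ij}}

% Heat map: weighted sum of the feature maps, keeping only positive evidence.
L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\!\left( \sum_{k} \alpha_k^c \, A^k \right)
```

The heat map has the spatial size of the feature maps and is upsampled to the input resolution before being overlaid on the MRI slice.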

ROC curve analysis
The proposed model's performance is evaluated by computing the ROC (receiver operating characteristic) curve and the AUC (area under the curve) 49. The one-vs-rest method is used for multiclass classification. ROC curves are built with 1 - specificity (false positive rate) on the x-axis and sensitivity (true positive rate) on the y-axis. Calculating the area under the ROC curve yields the AUC score, which ranges from 0 to 1. The closer the value is to 1, the better the model performs; the closer it is to 0, the worse it performs. Figure 10 displays the ROC curves for the first, second, and suggested CNN models across the five classes, where Classes 0, 1, 2, 3, and 4 refer to CN, MCI, AD, LMCI, and EMCI, respectively. As Fig. 11 shows, the proposed model achieves high AUC values for all classes of Alzheimer's disease: 0.9992 for CN, 0.9707 for MCI, 1 for AD, 1 for LMCI, and 0.9737 for EMCI. The AUC values obtained with the proposed CNN1 were 0.9978 for CN, 0.9956 for MCI, 0.9950 for AD, 1 for LMCI, and 0.9997 for EMCI, while those obtained with the proposed CNN2 were 0.9994 for CN, 0.9818 for MCI, 0.9758 for AD, 1 for LMCI, and 0.9831 for EMCI. Therefore, the proposed model is a more accurate and reliable method for diagnosing Alzheimer's disease.
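The one-vs-rest AUC computation can be sketched without any plotting. The sketch relies on a standard equivalence: the area under the ROC curve equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function names and the toy probabilities are illustrative.

```python
def roc_auc(scores, positives):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, is_pos in zip(scores, positives) if is_pos]
    neg = [s for s, is_pos in zip(scores, positives) if not is_pos]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def one_vs_rest_auc(probabilities, labels, n_classes):
    """One AUC per class: that class is positive, all others negative."""
    return [roc_auc([row[c] for row in probabilities],
                    [y == c for y in labels])
            for c in range(n_classes)]

# Toy softmax outputs for four samples and three classes (hypothetical).
probs = [[0.8, 0.1, 0.1],
         [0.2, 0.7, 0.1],
         [0.1, 0.2, 0.7],
         [0.6, 0.3, 0.1]]
aucs = one_vs_rest_auc(probs, labels=[0, 1, 2, 0], n_classes=3)
```

The pairwise form is quadratic in the number of samples, which is acceptable for a small test set; a rank-based computation gives the same value faster on larger ones.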

Wilcoxon signed-rank test
To ensure that the results were not merely due to random chance, a statistical significance analysis was conducted. The p-values for each model were computed using the Wilcoxon signed-rank test. This non-parametric test is commonly employed to compare two paired samples. It assesses the pairwise differences across multiple observations from a single dataset, and its outcome indicates whether the population mean ranks of the two samples differ. The p-values for the pairwise comparisons of the models 50,51 are detailed in Table 11. Compared to the other models, the suggested model exhibited superior performance. In essence, the proposed model significantly outperformed the other four models, as indicated by the p-values for the comparisons between the suggested model and the others being less than 0.05.
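The mechanics of the test can be sketched in plain Python. This version uses the large-sample normal approximation without tie or continuity corrections, which is only rough for the small number of paired scores typical of model comparisons; it is an illustration, not the procedure used to produce Table 11. It also assumes at least one non-zero difference.

```python
import math

def wilcoxon_signed_rank(a, b):
    """Return (W, two-sided p-value) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # drop zero differences
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied absolute differences and share the mean rank.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    # Normal approximation of the null distribution of W.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))
    return w, p

# Hypothetical paired scores for two models on the same eight folds.
w, p = wilcoxon_signed_rank([10, 12, 15, 11, 14, 13, 16, 18],
                            [8, 9, 11, 6, 8, 6, 8, 9])
```

A p-value below 0.05 would lead to rejecting the hypothesis that the two paired samples come from the same distribution.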
Numerous studies have employed various methodologies to categorize the stages of AD.As shown in Table 12, we compared the performance of the proposed system with various models discussed in the literature review.
Clearly, the recommended approach yielded the best results in terms of accuracy and performed exceptionally well in 3-way, 4-way, and 5-way multiclass classification problems.Additionally, the results underscore the importance of concatenating multiple CNN models in the classification layer to enhance the model's discriminative ability.Compared to single-model techniques, our method excels in capturing AD-related patterns by integrating complementary data from different CNNs.
The proposed method offers several advantages over traditional methods for early AD detection:

Conclusion
In summary, this research proposes a new method for early detection of Alzheimer's disease (AD) using magnetic resonance imaging (MRI) data. The suggested approach employs two convolutional neural networks (CNNs) and combines their outputs by concatenating them in a classification layer. The objective is to capture various spatial and structural features of the brain, facilitating a comprehensive analysis of AD-related patterns. The efficacy of our approach is demonstrated through experimental results on the ADNI dataset, as compared to findings from prior research, as depicted in Figs. 12, 13, and 14. For the 3-way, 4-way, and 5-way classification tasks, we achieved notably high accuracy rates of 99.43%, 99.57%, and 99.13%, respectively. Overall, this study advances the field of AD detection by introducing an innovative approach with promising accuracy results. The proposed method has the potential to assist doctors and researchers in earlier AD diagnosis, paving the way for proactive treatments and improved patient outcomes. Future endeavors will focus on validating the method with larger datasets, exploring its applicability in clinical settings, and integrating additional data modalities to enhance accuracy.

Figure 2. The methodology of the proposed work.

Figure 3. Class distribution of the MRI dataset after oversampling.

Figure 7. Confusion matrix of proposed model on test data (a) CNN1; (b) CNN2; (c) the overall developed CNN.

Figure 9. Examination of whether the predicted label matches the actual label.

Table 1. Key statistics for each clinical diagnosis.

Table 2. Parameters of the proposed CNN.

Table 4. The performance of the first developed CNN, the second developed CNN, and the proposed model on the test data.

Table 5. Precision, recall, and F1-score for each class when applying only the first developed CNN to the test data to classify it into 5 categories.

Table 6. Precision, recall, and F1-score for each class when applying only the second developed CNN to the test data to classify it into 5 categories.

Table 7. Precision, recall, and F1-score for each class when applying the proposed CNN to the test data to classify it into 5 categories.

Table 9. Precision, recall, and F1-score for each class when applying the proposed CNN to the test data to classify it into 3 categories.

Table 10. Precision, recall, and F1-score for each class when applying the proposed CNN to the test data to classify it into 4 categories.